Indexing and Mining Free Trees
Abstract
Tree structures are used extensively in domains such as computational biology, pattern recognition, and computer networks. In this paper, we present an indexing technique for free trees and apply it to the problem of mining frequent subtrees. We first define a novel representation, the canonical form, for rooted trees and extend the definition to free trees. We also introduce another concept, the canonical string, as a simpler representation for free trees in their canonical forms. We then apply our tree indexing technique to the frequent subtree mining problem and present FreeTreeMiner, a computationally efficient algorithm that discovers all frequently occurring subtrees in a database of free trees. Our mining algorithm is a variation of the traditional Apriori method for mining frequent itemsets. We study the performance and scalability of our algorithms through extensive experiments based on both synthetic data and datasets from two real applications: a dataset of chemical compounds and a dataset of Internet multicast trees. The experiments show that our algorithm scales linearly in the cardinality of the database.
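The role of a canonical string can be illustrated with a minimal sketch. The abstract does not spell out the paper's own definitions, so the code below is not FreeTreeMiner's canonical form. It swaps in a standard approach instead: root the free tree at its center vertex (or try both centers when there are two), encode each rooted subtree as a parenthesized string with the child encodings sorted, and keep the smallest result. The function names, the edge-list input, and the assumption that vertex labels contain no parentheses are all hypothetical choices made for illustration.

```python
def tree_centers(adj):
    """Return the one or two center vertices of a free tree by peeling leaves."""
    remaining = len(adj)
    if remaining <= 2:
        return list(adj)
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    leaves = [v for v, d in degree.items() if d == 1]
    while remaining > 2:
        remaining -= len(leaves)
        next_leaves = []
        for leaf in leaves:
            for nbr in adj[leaf]:
                degree[nbr] -= 1
                if degree[nbr] == 1:  # nbr has just become a leaf
                    next_leaves.append(nbr)
        leaves = next_leaves
    return leaves


def rooted_string(adj, labels, root, parent=None):
    """Encode the subtree under `root`; sorting children removes order dependence."""
    child_strings = sorted(
        rooted_string(adj, labels, child, root)
        for child in adj[root]
        if child != parent
    )
    return "(" + labels[root] + "".join(child_strings) + ")"


def canonical_string(edges, labels):
    """Illustrative canonical string of a labeled free tree (not the paper's)."""
    adj = {v: [] for v in labels}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    # With two centers, take the lexicographically smaller encoding.
    return min(rooted_string(adj, labels, c) for c in tree_centers(adj))


# Two drawings of the same labeled tree, with different vertex identifiers.
tree_a = canonical_string([(1, 2), (2, 3), (2, 4)],
                          {1: "C", 2: "C", 3: "O", 4: "N"})
tree_b = canonical_string([(10, 30), (20, 30), (30, 40)],
                          {10: "O", 20: "N", 30: "C", 40: "C"})
print(tree_a == tree_b)
```

Because isomorphic labeled trees map to the same string, such a string can serve as a lookup key, for example as a dictionary key when counting how many database trees contain a candidate subtree.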
Related Resources
Discovering Rules using Fuzzy Decision Trees for better Visual Indexing based on Colors
This paper presents results, at an early stage of research work, of the use of fuzzy decision trees in a multimedia framework. We present the discovery of rules in three different indexing scenarios. These rules represent knowledge that can be interpreted as guidelines for the development of better indexing tools. We use a fuzzy decision tree algorithm to extract these rules (just) from color p...
Discovering knowledge for better video indexing based on colors
In this paper, we present the discovery of rules for different challenges encountered in video indexing. These rules should be considered as knowledge that can be used as a guideline for the development of better indexing tools. We use a fuzzy decision tree to extract the rules based on color proportions of key-frames extracted from one single video-news. Experimental results and comparisons wi...
Performance Evaluation of Parallel S-Trees
The S-tree is a dynamic height-balanced tree similar in structure to B+-trees. S-trees store fixed-length bit-strings, which are called signatures. Signatures are used for indexing textbases, relational, object-oriented and extensible databases as well as in data mining. In this article, methods of designing multi-disk B-trees are adapted to S-trees and new methods of parallelizing S-trees are d...
Term Indexing for the LEO-II Prover
We present a new term indexing approach intended to support efficient automated theorem proving in classical higher-order logic. Key features of our indexing method are a shared representation of terms, the use of partial syntax trees to speed up logical computations, and indexing of subterm occurrences. For the implementation of explicit substitutions, additional support is offered by indexing o...